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We derive and study optimal and nearly-optimal strategies for the detection of sinusoidal signals 
hidden in additive (Gaussian and non-Gaussian) noise. Such strategies are an essential part of 
algorithms for the detection of the gravitational Continuous Wave (CW) signals produced by pulsars. 
Optimal strategies are derived for the case where the signal phase is not known and the product of 
the signal frequency and the observation time is non-integral. 
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I. INTRODUCTION 

A key problem in data analysis is to detect sinusoidal 
signals in noise. Such signals are often called "lines" or 
"peaks" because in the Fourier domain (frequency space) 
they appear as spikes (line-like features) or sharp narrow 
peaks in the energy spectrum of the signal. When the 
signal is large compared to the noise, such signals are 
easy to identify. When they are weak, the identification 
becomes more difficult. 

The work in this paper was motivated by the devel- 
opment of algorithms to search for Continuous Wave 
(CW) signals in the new generation of intcrferometric 
gravitational-wave detectors which are either under con- 
struction jj], ||, ||, ||, HI or planned || . These signals are 
produced by rapidly spinning neutron stars (pulsars). 

To search for new (previously undetected) pulsars re- 
quires a search over possible sky positions, frequencies, 
and pulsar spin-down parameters. The parameter space 
is very large and these searches are computationally very 
intensive. Moreover the searches will be looking for sig- 
nals that are (statistically) at the lower limit of detection 
sensitivity 0. 

A brute-force approach (optimally filtering for all pos- 
sible source parameters) requires unrealistic computa- 
tional resources (Petaflops), so more sophisticated hier- 
archical approaches have been proposed. When the pa- 
rameter space is very large, these approaches retain much 
or all of the sensitivity of the brute-force approach but 
require less computational resources. This is possible be- 
cause, in the brute-force approach, the number of grid 
points in parameter space is so large that the detection 
threshold must be set very high to avoid false alarms and 
enable confident detection. A hierarchical search visits 
fewer points in parameter space: it ignores those below 
the (high) threshold that one must set in order to gain the 
necessary detection confidence while examining a large 
parameter space. In other words a hierarchical search 
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method does not "waste" precious computational cycles 
examining regions in parameters space where, even if a 
signal were present, it would not be detected confidently 
enough. 

The hierarchical search techniques ||, [)[ [l0| [ll| all in- 
volve a second (so-called incoherent) stage. This stage is 
called "incoherent" because it uses spectral rather than 
amplitude data. If one neglects polarization, in all of 
the proposed approaches a putative signal at the second 
stage would (effectively) appear in a spectrum as a si- 
nusoidal signal at fixed frequency and phase. The third 
stage of the search works only on the regions in param- 
eter space where significant spectral lines were identified 
in the second stage. 

Our paper addresses the problem of identifying these 
candidates, that is "registering" candidate sinusoidal sig- 
nals. The analysis makes use of the Neyman- Pearson cri- 
teria to identify the "best" statistic to use for such iden- 
tification. In some cases, the best statistic depends upon 
the expected amplitude of the signal, which is unknown. 
In these cases, we have used locally-optimal methods to 
identify the best statistic in the weak-signal limit. 

The analysis is complicated by several factors: 

• The signal frequency and phase are not known in 
advance. 

• The signal frequency may not lie at an integer mul- 
tiple of the Rayleigh frequency T~ x . A signal of 
this type does not make an integer number of cy- 
cles during the observation time T. We call such 
frequencies, and the corresponding signal, "unre- 
solved" . 

• The signal frequency must be identified with reso- 
lution less than ±(2T) -1 , i.e., to within the nearest 
frequency bin. 

• The method must handle non-Gaussian noise in an 
optimal manner. 

The analysis presented here addresses all of these con- 
cerns. 
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II. PROBLEM DESCRIPTION AND OPTIMAL 
STATISTICS 



The basic problem that we consider is the following. 
We are given N samples of a time-domain data stream, 
sampled at discrete times t = tj = jAt. We denote this 
data by yj for j = 0, 1, • • • , N — 1. The total observation 
time is T = N At. The question that we want to answer 
is, does the data stream yj contain a sinusoidal signal 
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of constant amplitude and frequency? To address this 
question, we make use of the theory of optimal signal 
detection. It is convenient to recast the problem in the 
Fourier domain. Denote the Discrete Fourier Transform 
(DFT) of the data@ by x k : 



Xk = 



N-i 

E 



Vj 



^2-nijk/N 



for fc=-JV/2+l,— ,N/2. (2.2) 



Since this transformation is invertible, any question or 
statement about the y's can also be stated in terms of 
the x's, hence we will often use the term "data" to refer 
to the x's rather than to the t/'s. Here, and elsewhere, 
the symbols x and y without indices refer to the collective 
ensemble of all the data. For convenience we will assume 
that N is a power of two. The index k will often be 
referred to as a "frequency bin". The frequencies that 
these bins correspond to, 

h = ~N~At = T (2 - 3) 
are called "resolved frequencies" for reasons that will be- 
come clear later. 

In what follows, we will assume that the data y is 
real. In this *_ k where * denotes complex- 

conjugate, and both xq and £jv/2 are real. The data 
set y is then exactly equivalent to the set of x^ for 
k = 0, • • • , N/2. To simplify the mathematics, we will 
assume that the average value of the y's vanishes (i.e., 
that the DC or average value has been removed from the 
data) so that xq — 0. We will also assume that there 
is no energy at the Nyquist frequency Jn/2 (which in a 
real experiment would be enforced by appropriate anti- 
aliasing filters) so that a;jv/2 = 0. Then, the data set y is 
exactly equivalent to the set Xk for k = 1, • • • , N/2 — 1. 

We use the notation p(x\e) to denote the probability 
distribution function (pdf) of the data, in the presence 
of a signal whose amplitude is e. For example, 

• if the (real and imaginary parts of the) noise in each 
frequency bin is independent and Gaussian with 
vanishing mean and unit variance, and the signal is 
a sinusoid of known phase at resolved frequency fi 
given by yj = ejj cos(27r/^ij - <f>), then 



p(x\e) 
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Note that since I is an integer, and e is real, the 
signal only affects the ^'th frequency bin. 

• if the assumptions are the same as above, but the 
phase of the signal is unknown and uniformly dis- 
tributed over the range cj) S [0, 2ir), then 
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Somewhat later, we will relax these assumptions, and 
give more general forms for p(x\e) where 

• the signal frequency is not a resolved frequency, 

• the noise is not white, and 

• the noise is not Gaussian. 

Note that the integration measure for p(x\e) is 

/ N/2— 1 „ oq />oc 
dx = Y\ / d'Rxk i dSsXk, 
t_i J —oc J— oo 

where 3? and 9 denote the real and imaginary parts. 

The problem that we wish to solve is well-known in the 
theory of signal detection. The space of possible mea- 
surements Xk for k = 1, ...,N/2 — 1 is R w " 2 . Our goal 
is to divide this space of possible measurements into two 
disjoint regions Hq and Hi, whose union is all of R N ~ 2 . 
If the observed data lies in H (the "null-hypothesis re- 
gion" ) we will conclude that no signal was present in the 
data. If the data lies in Hi , we will conclude that a signal 
was present. The problem we need to solve is this: what 
is the best choice of Hq and Hit 

The solution we chose is the Neyman-Pearson crite- 
rion: the best choice is the one that gives the lowest false 
dismissal probability for a given false alarm probability. 
The false alarm probability a is the probability that a 
signal is detected when none is present: 



dx p(x\0) ) 



(2.4) 



and the false dismissal probability /3(e) is the probability 
that a signal of amplitude e is not found: 



dx p(x\e) 



(2.5) 



xeH 



The Neyman-Pearson criteria leads immediately to the 
following rule to partition the space of possible measure- 
ments into Hq and Hi . Define the likelihood function on 
the space of possible measurements by 



A(x) 



P(x\e) 
p(x\0) 



and consider the surface A(x) = Ao = constant. The 
Neyman-Pearson criteria leads to the following choice: 
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take Hq to be the region inside this surface, and Hi to 
be the region outside this surface. The value of A that 
defines the surface determines the false alarm and false 
dismissal probabilities. 

In this paper, we will use the Neyman-Pearson crite- 
ria to define an "optimal statistic" which we will denote 
t(x). This is any function of the observed data x whose 
level surfaces are the same as the level surfaces of A(x). 
If the statistic is greater than some threshold T then we 
conclude that a signal is present, and if the statistic is 
less than the threshold T we conclude that no signal was 
present. The false alarm and false dismissal probabilities 
are functions of this threshold T: as T is increased the 
false alarm probability gets smaller, and the false dis- 
missal probability gets larger. In general this optimal 
statistic is a function of the signal amplitude e. However 
we will see that for the pulsar detection problem, where e 
is small, the optimal statistic is effectively e-independcnt. 



III. A WORKED EXAMPLE 

To help make these ideas concrete, we give a complete 
worked example, demonstrating these ideas for the sec- 
ond pdf described above: a signal of unknown phase at 
a resolved frequency fg. The pdf is 
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(3.1) 

Before continuing, it is convenient to express this in 
closed form. Writing the complex data sample xg — 
\xg\exp{iij)£) in terms of its modulus \xg\ and phase tpg, 
one has 

j_ [Zh± e -i\**-~ i *\* 



_L /'^J_ P -Kl^l 2 + £2 - 2eS^ W o, ' , )) 



2tt 7 2tt 



2tt J 2tt 

= _L c -^(M 2 +e 2 ) _L [A^Mcos^-iPt) 

2vr 2tt J v 

= ^c-^l 2 +^/ ( e M). 

Z7T 

The final integral has been expressed in terms of a mod- 
ified Bessel function Iq (r) of the first kind 



Io(r) 



1 



d9e r( 



Thus we obtain a closed form for the pdf (3.1) 
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The likelihood function is now easily found: 



A(x) 



P{x\e) 
p(x|0) 



e 2' 



Io(e\xi\). 



(3.2) 



(3.3) 



While in a general situation, the likelihood function de- 
pends upon all the different variables, in this particular 
situation it only depends upon \xg\. 

We defined an optimal statistic t to be any function 
whose level surfaces are the same as the level surfaces 
of the likelihood function A(x). In this simple situation, 
the likelihood function A(a;) = A(xj., • • • , <EjV72-i) only 
depends upon the modulus \xg\ of the amplitude in a sin- 
gle (the £'th) Fourier bin. Since it is a monotonically 
increasing function of \xt\, we can choose as an opti- 
mal statistic any monotonic function of \xg\, for example 
\xg\ or \xg\ 2 . For historical and later convenience, let us 
choose as our optimal statistic the function r = \xg\ 2 . 
This is the power in the £'th bin. The mean value of this 
statistic, the power in the €th bin, is 



/ dx r P (x\e) = / dx N 



! p(x\e) = 2- 



(3.4) 



In the absence of a signal (e = 0) both the real and 
imaginary parts of xg contribute unity. 

To complete the analysis of this example, we need to 
calculate the false alarm and false dismissal probabili- 
ties. We will define, for a given value of threshold T, the 
regions Hq and Hi by: 

H = {{xi, ■ ■ ■ ,xn/2-i) such that r = \xi\ 2 < T} , and 
Hi = {(xi, ■■■ ,xjv/ 2 _i) such that r = \x e \ 2 > T) . 

Thus our choice of statistic gives a decision rule which 
has a simple physical interpretation. If the power in bin 
I is greater than T, we conclude that a signal was present. 
If not, we conclude that no sig nal w as present. 

The false alarm probability (2^4) is easy to calculate. 
It is given by the following function of threshold T. 

a(T) = / dxp(x\0) 



dx p{x\Q) 

N/2-1 



.(j-iNkl 2 



\xe\ 2 >T 



f dx TT — < 

J\x t \*>T ^ 27r 

d^.xg I d^Xi^-e-i^ 2 

J 27T 

\xe\ 2 >T 

2-7T pOO -i 

d^t / \xt\d\xt\— e-il^l 2 

oo 



T/2 

,-r/ 2 



(3.5) 



In this calculation, the transition from the 3rd to the 4th 
line is trivial because we integrate over the all the coor- 
dinates except for xg. In going from the 4th to the 5th 
line, we have changed variables from real and imaginary 
parts, to polar coordinates. 

The false dismissal probability (|]^), which depends 
both upon the signal amplitude e and upon the value 



4 



0.8 



0.6- 
P 

0.4 



0.2- 




0.2 



0.4 



a 



0.6 



0.8 



FIG. 1: The false dismissal probability /3(T) as a function 
of the false alarm probability a(T) for different values of the 
signal amplitude e. The top curve has e = 0.2. Moving down, 
the remaining curves have e = 0.5, 1.0, 2.0, 3.0. Along each 
curve, the threshold T varies from to 8. In the bottom 
right of the graph, T = 0. The crosses mark the points where 
T = 1/2, 1,3/2, ••• ,8. For example, with a threshold T = 
5.5, if the signal amplitude is e = 3, then the false alarm 
probability is a w 6.4% and the false dismissal probability is 
13 « 20%. 



T of the decision statistic threshold, is obtained with a 
similar calculation. 
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(3.6) 



This final integral can not be evaluated in closed form. 
However it is easy to check that the limit /3(oo) = 1: if 
the threshold is set very large, then the false dismissal 
probability is unity. In a moment, we will study the be- 
havior of f3 in the weak-signal limit as e — > 0. However, 
before this, it is instructive to study the false-alarm ver- 
sus false-dismissal curves for this statistic. 

The false alarm and false dismissal curves for this opti- 
mal detection statistic are illustrated in Fig. |l|. Plotting 
(3 as a function of a provides a way of describing the op- 
timal statistic which is completely independent of the ac- 
tual choice of the statistic. pl| However, the relationship 
between the threshold T and the false alarm and false 



dismissal probability does depend upon the choice of op- 
timal statistic. Because this statistic has been chosen by 
the Neyman-Pearson criterion, any other detection statis- 
tic that we choose will have poorer performance. Thus, 
for a given signal amplitude e, and for a given false alarm 
probability a, any other detection statistic will have a 
larger false dismissal probability f3: it will lie above the 
illustrated curves. 

Our primary interest is in very weak signals. For the 
pulsar detection problem, we will have e « 0.2, and will 
be operating on the threshold of detection where a + (3 is 
only slightly smaller than unity. For such weak signals, 
it is useful to define the quantity 



7 (T) = l-a(T)-/3(T). 



(3.7) 



This may be considered either as a function of the thresh- 
old T or as a function of the false alarm probability a(T). 
This quantity 7 is the difference between the detection 
probability when a signal is present, 1 — (3, and the false 
alarm probability a. For example, for a very weak sig- 
nal, the threshold might be set for a false alarm prob- 
ability of a — 15%. The false dismissal probability for 
this weak signal might be (3 = 84%. Thus, if no signal is 
present, the threshold will be exceeded a = 15% of the 
time. If a signal is present, the threshold will be exceeded 
1 — (3 = 16% of the time. Roughly speaking, the differ- 
ence between these, 7 = 1 — a— j3 = 1%, is the probability 
of the threshold being exceeded because the signal was 
present, rather than because of the detector noise. These 
weak-signal-limit curves are shown in Fig. ||. 

In the small-e (weak signal) limit, it is easy to obtain 
an approximate closed-form for [3. By substituting the 
power series representation of the Bessel function, 
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into Eqn. (|3.6[) and integrating term-by-term, one obtains 



7=1 — a — [3 — 



-Te~ T ' 2 
4 

1 



1 + l6 (T - 4)+ 576 (T2 



12T+24) 



-— e a In a 



l--(2 + lna) 



-— — — (6 + 6 In a 

144 v 
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(3.8) 



Even at the lowest order in e (the first term in square 
brackets) this is a very good approximation, as shown by 
the dashed curves in Fig. |[ At the next order (the first 
two terms in square brackets) the approximation is in- 
distinguishable from the exact result in Fig. || - the solid 
curves. This simplifies matters enormously. Although 
the statistics of the optimal detection strategy depends 
upon the signal amplitude e, for small e, this dependence 
is simple enough to be analytically approximated. 

The detection probability plays a key role in the sig- 
nificance of an observation. A hierarchical pulsar search 
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FIG. 2: Solid curves: detection probability 7 = 1 — a — [3 
as a function of the false alarm probability a for different val- 
ues of the signal amplitude e = 0.1, • • • , 0.5 (moving up from 
the bottom curve). The crosses mark different values of the 
threshold in the same way as for Fig. Dashed curves: the 
0(e 2 ) approximation is 7 = e 2 Texp(— T/2)/4 = — e 2 alna/2. 
The 0(e 4 ) approximation to 7 is not shown because on this 
graph it is indistinguishable from the exact result (the solid 



hunts for peaks in the spectra coming from a set of n 
sequential time series. For example, suppose each time 
series of length N is one day long. Three months of such 
data would correspond to n = 120. What choice of false 
alarm probability a (or equivalently, of detection thresh- 
old T) is optimal? 

This question is easily answered. One might guess that 
the best operating point is where the detection probabil- 
ity 7=1 — a — P is maximized: in the weak signal case 
this is at a threshold of T = 2 corresponding to a false 
alarm probability a = 1/e « 36.78%. However this is 
not correct. In the absence of signal, each of the n data 
sets is independent. The probability of detecting peaks 
in p of the n data sets is the same as the probability that 
a coin will come up heads p times in n flips (if the prob- 
ability of "heads" is the false alarm probability a). This 
is given by the binomial distribution: 



probability of p peaks 



a p (l 



\n—p 



Thus, in the absence of a signal, the mean number of 
peaks is an, and its variance is a 2 = a(l — a)n. In the 
presence of a signal, the mean number of peaks registered 
is (1 — (3)n. A good way to choose a false alarm probabil- 
ity (or threshold) is to maximize the significance s. This 



(# peaks) S ig na i - (# peaks) no signal 
a 

(1 — (3)n — an 

\f 'a(l — a)n 

1 - a - (3 _ 
y/a(l - a) 

T r 

= yn 
y/a(l - a) 



(3.9) 



The significance is easily calculated as a function of either 
a or T. In the weak-signal limit, it is 



T 



jn~ 4 Ve T / 2 - 1 



1 - a 



■ln< 



The significance as a function of either T or a has a max- 
imum at the threshold value T « 3.18721 correspond- 
ing to a false alarm probability of a sw 20.3188%. The 
significance at this threshold/false alarm probability is 
s w 0.402371e 2 v / n. Note that this exhibits the expected 
y/n scaling in the number n of spetra analyzed. We have 
numerically verified that this is the optimal statistic. 



IV. 



EXAMPLE: LOCAL PEAK DETECTION 
NON-OPTIMAL STRATEGY 



Section [II found and analyzed the optimal (i.e. 
Neyman-Pearson) peak detection strategy. In this Sec- 
tion, we carry out an identical analysis of a different 
(hence non-optimal) strategy. The main purpose is to 
illustrate a side-by-side comparison of different detection 
statistics. 

We will assume that the signal and noise satisfy the 
same assumptions as in Section [II, given by Eqn. ( |3.2] ). 
There, we showed that the optimal detection strategy was 
to threshold on the power r = \xe | 2 in the I'th bin. Here, 
we adopt a different detection strategy. We will say that 
a peak has been detected if and only if the power \xe\ 2 
in the £'th bin exceeds the threshold T and is greater 
than the power in either of the neighboring frequency 
bins. This strategy looks for "local peaks" that exceed 
the threshold. 

For this peak detection strategy, the detection region 
Hi is defined by 

Hi = I (xi, ■ • ■ , Xff/2-1) such that 

\x e \ 2 > T and \x e \ 2 > \x t -i\ 2 and \x e \ 2 > |^+i| 2 |. 

In other words, the peak detection strategy is to register 
a peak if the observed data set lies in H\. The null- 
hypothesis or no-signal region Hq is the set complement 
Hq = H N ~ 2 — Hi: all points not lying in H\. 

To compare this strategy to the optimal one found 
in Section III, we calculate the false-alarm and false- 



detection curves as before, and compare them with the 
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optimal strategy. The false alarm probability is 



t (T) = fdxp(x\0) 
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\xe\ 2 >T 
\xtf > |^_i| 2 
\xe\ 2 > \x e+1 \ 2 



In these expressions, J dxk denotes /_ dtftxk J_ d^sxk- 
Putting each of the three integrals into polar coordinates 
immediately yields 



(T) = \ j^d\x e \ 2 e-l-l 2 /2 \ J d 



T 
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d\x t \- e 
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du e~ u 

T/2 



2 „-|x £ | 2 /2 

[1-e-] 



-e 2 J - e + e 2 



(4.1) 



The quantity in square brackets that appears in the inter- 
mediate steps of this calculation is simply the probability 
that bins t± \ contain less power than the _Tth bin. This 
is one minus the false alarm probability (3.5) of the op- 
timal test. 

As with the optimal test, the false alarm probability 
a(T) vanishes at large threshold T — > oo. However, un- 
like the optimal test, the false alarm probability at zero 
threshold is not unity: a(T = 0) = 1/3. This is because, 
even if the threshold vanishes, to register as a peak the 
£'th bin must contain more power than both adjacent 
bins. When no signal is present, this only happens 1/3 
of the time. 

The false dismissal probability for this non-optimal 
peak detection strategy can be calculated with the same 
methods as above. One finds 



0(T) = / dxp{x\e) 

Jxetia 
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As for the optimal statistic, this false dismissal probabil- 
ity approaches one at large threshold T — > oo. However, 
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FIG. 3: Solid curves: false-dismissal j3 versus false-alarm a 
for the non-optimal detection strategy of this Section. Moving 
down from the top, the curves correspond to signal strengths 
e = 0.2, 0.5, 1, 2, 3. Notice that the false alarm probability a 
is less than 1/3 for any value of the threshold T. For com- 
parison, the dashed curves show the optimal strategy of 
the previous Section. Notice that the optimal strategy always 
yields a lower false dismissal probability for a given false alarm 
probability. The crosses mark threshold values T = 1, 2, • • • , 8 
increasing to the left along each curve. 



unlike the optimal test , it d oes not vanish at zero thresh- 
old. Setting T = in (O) on finds that 



0(T = 0) = c- 2 / 4 - ie" 



e 2 /3 



If the signal amplitude is small e — > then (3(T = 0) — > 
2/3. There is a 2/3 probability of missing a small signal 
at zero threshold, because one of the two neighboring 
frequency bins might contain more power than bin I. 

A set of false alarm/false dismissal curves for this non- 
optimal statistic are shown in Fig. |], along with the 
same curves for the optimal statistic. Note that for a 
given signal strength and false alarm probability, the false 
dismissal probability is always lower for the Neyman- 
Pearson test. Also notice that a given value of the thresh- 
old for one test statistic does not yield the same false 
alarm probability as the same threshold value for the 
other statistic. As the false alarm probability decreases, 
the two statistics have a performance (false dismissal 
probability) that becomes increasingly similar. This is 
because at increasing values of the threshold T, fewer and 
fewer peaks are rejected because the neighboring peaks 
are larger. 

In the small-signal limit e — > 0, one can use the series 
expansion of the Bessel function to obtain analytic ex- 
pressions for the false alarm probability 0. The signal 
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Significance 


local peak 
test 

, , , , 


optimal 

statistic H 




Detection Probability 


local peak 
- / test 


optimal 
statistic 



false alarm probability a 

FIG. 4: These graphs are a comparison of two differ- 
ent peak- finding methods, in the weak signal limit (small 
e). The dashed curves correspond to the optimal (Neyman- 
Pearson) test: thresholding on the signal power. The solid 
curves correspond to the local peak test described in this 
Section. The bottom graph shows the detection probability 
7/e 2 = (1 — a — /3)/e 2 as a function of false alarm probabilit 

a. The top graph shows the significance , 7 Table 

<s 2 V a ( 1 

compares the properties of these curves. 



-a) 





Optimal Test 


Local Peak Test 


Maximum of 1 — a — f3 = 


0.1839 e 2 


0.1529 e 2 


is at threshold value T = 


2.0 


2.0 


and false alarm prob a — 


36.79% 


24.91% 


Maximum of - 


0.4024 e 2 


0.3806 e 2 


is at threshold value T = 


3.187 


3.567 


and false alarm prob a — 


20.32% 


14.14% 



TABLE I: A comparison of the optimal Neyman-Pearson de- 
tection strategy, and the sub-optimal local peak detection 
method, in the weak-signal limit. Most of these values can 
be read off Fig. ^. The top half of the table gives information 
about the maximum of 7 = 1 — a — /3, such as the value of 
the threshold at the maximum. The bottom half of the table 
gives the same information for the maximum of 



detection probability is 
1-U-/3 



7 



e 2 e -3T/2 



T I e r - e r / 2 



3 r/2 



0{e 4 



-T 



-3T/2 



+ 0(e 4 ). 



This signal detection probability can't be e xpre ssed in 
analytic form entirely in terms of a given by (4.1). How- 
ever we can plot it and compare with the identical curve 
for the optimal strategy. This is shown in Fig. ||, which 
also shows the significance as a function of the false alarm 
probability. The comparison is shown in Table ffl. 

The primary purpose of these last two Sections was to 
demonstrate how a signal detection strategy can be cho- 



sen in an optimal fashion, and how it can be compared 
to a sub-optimal strategy. In a "real world" situation, 
it may be highly desirable to apply a sub-optimal strat- 
egy, because the mathematical model of the instrumental 
noise may not be complete, and might not accurately re- 
flect its real behavior. In fact, the sub-optimal method 
discussed in this Section has only slightly poorer perfor- 
mance for the simple Gaussian noise model than the op- 
timal test, but may perform much better on "real world" 
data which has correlations between different frequency 
bins. 

In the following Section, we will apply these methods 
to develop optimal tests for the case where the sinusoidal 
signal frequency is not one of the exactly resolved fre- 
quencies fk- 



V. COMMENTS ON THE WEAK-SIGNAL 
APPROXIMATION 

In the previous Sections, we studied the validity of the 
weak-signal limit e — > 0, and made use of it when appro- 
priate. We will continue to take this limit throughout 
the paper. This brings up several interesting issues. 

These types of weak-signal approximations have been 
studied extensively under the rubric of "Locally optimal 
statistics" [|l2|. Later in this paper, they will make treat- 
ment of non-Gaussian noise models tractable. 

In practice, the weak-signal approximation is well- 
justified for the pulsar detection problem. This is dra- 
matically illustrated in Fig. g. This is a typical case: for 
e < 1/2 only the lowest-order terms in e need to be re- 
tained in order to have a good approximation. Keeping 
the next order terms as well gives an extremely good ap- 
proximation even for e w 1. Typical detectable signal 
strengths will be e ss 0.2. 

In the weak-signal limit, the pdf can be well- 
approximated by the first non- vanishing term in its Tay- 
lor series in e. The first derivative of p(x\e) w.r.t. e van- 
ishes at e = 0, because p is an even function of e. This is 
because the phase <fi of the signal is uniformly distributed 
in the range [0, 2tt). The pdf is well-approximated by 

p(x\e) = p(ar|0) + NO) + 0(e 4 ), (5.1) 

where ' denotes d/de. The likelihood function is then 
approximated by 



A(x) 



P(x\e) 
p(x\0) 



= 1 



I 2 p"{x\Q) 
2 £ p{x\Q) ■ 



(5.2) 



Thus in the weak signal case (neglecting second order 
terms in the signal amplitude e) the optimal detection 
statistic is independent of signal strength, and can be 
found from the second derivative of the pdf at zero signal 
strength. This tremendously simplifies the analysis. 

The likelihood function itself, or the likelihood function 
minus a constant can be used as the optimal statistic r 
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(for example threshold on A— 1). In the absence of signal, 
the mean value of this statistic must vanish. This follows 
immediately from the definition of A, since 



dx p(x\0) (A - 1) = dx \p(x\e) - p(x\0)] = 0. 



where the function Dm is the Dirichlet Kernel: 



(5.3) 

In the weak signal case, keeping only terms up to a given 
order (say e 2 ) in A — 1, it is easy to show that the same 
relation holds. Hence, in the absence of a signal, the 
mean value of A(x) — 1 vanishes. This will be useful 
later. 



VI. OPTIMAL DETECTION OF UNRESOLVED 
FREQUENCY SIGNALS 

We now begin to address one of our key concerns. 
The previous Sections showed how to systematically de- 
rive and characterize a detection strategy for the case 
where the weak sinusoidal signal had unknown phase, 
but where, if present, the signal's frequency precisely 
corresponded to one of the Fourier bins. We now sup- 
pose that the frequency is also a random variable, whose 
value is uniformly distributed between (// + fe-i)/2 and 
(ft + fe+i)/2. In other words, the signal of interest lies 
somewhere between a half-bin to the left and a half-bin 
to the right of the £'th frequency bin. 

Before delving into the details of the analysis, it will be 
helpful to briefly examine the appearance (in frequency 
space) of an unresolved sinusoidal signal in the absence 
of noise. Take the signal frequency to be 



fl 



NAt 



(6.1) 



where we do not assume that I is an integer (correspond- 
ing to one of the resolved frequencies). Let £ denote the 
nearest bin to /, so that 



I = £ - S for S G (- 



2' 2 J 



(6.2) 



Without loss of generality, we assume that the frequency 
fi is between DC and Nyquist, corresponding to the range 
I £ (0,N/2). In the absence of noise, the signal in the 
time domain is given by 

2 2 
y 3 = e— cos(2nfijAt -(j>)=e— cos(2njl/N - <j)). 

Substituting this into the DFT (^^) and using the sum 
of the geometric series 



N-l 



5>' = T 



1-2 



N 



(6.3) 



gives Fourier amplitudes 

x k = e [e»D N (k -l) + e-^D N (k + I)] , (6.4) 



D N (z) 



sin(Trz) 
Ns'm(irz/N) 



(6.5) 



As described following equation (2.3), the range of the 
frequency index k is 1, • • • , N/2—1. Since Dn(z) vanishes 
for all integer arguments except for zero, where its value 
is Djv(0) = 1, in the resolved-frequency case where I is 
an integer, one has Xk — for k ^ I, and xi — ee 4 *. In 
the unresolved case, the signal energy is not confined to 
the Fth bin, and forms a characteristic pattern of "side- 
lobes" in the nearby frequency bins. 

If the signal frequency is unresolved (I non- integer)) 
the optimal statistical test will not only involve data from 
the £' th bin. The adjacent frequency bins also contain 
part of the signal energy, and we will shortly find that 
the statistically optimal search also takes into account 
their content (in the sense of energy and information). 

One can simplify the form of the Dirichlet kernel with 
several approximations|l9|]. Our primary interest is to 
extract as much useful information as possible from the 
Fourier amplitudes in the bins near bin £. Because Dn(z) 
is strongly peaked at z = and falls off ~ z~ l away from 



it, one may neglect the second term in (6.4) and concen- 
trate on the first term. In addition, in practical appli- 
cations, N will be large enough (greater than 10 5 ) that 
the 1/N term in the exponential of Dm can be neglected. 
Finally, since we will be interested in the Fourier ampli- 
tudes in nearby bins, \z\ « N, which means that the 
denominator N sm(jrz/N) is well-approximated by nz. 
This leaves us with 

x k » ee^wf* - I), 
where the coefficients 



w(z) 



. sm7rz 



e l7r2 sinc(z). 



(6.6) 



Here jo is a spherical Bessel function, and we have used 
Woodward and Bracewell's definition of the sampling 
function sine. 

We now suppose that the signal of interest is dis- 
tributed, with equal probability, anywhere between ±1/2 
a frequency bin from the £'th bin, and write an expres- 
sion for the pdf of the data. If, as before, the signal 
phase (f> is a uniformly distributed random variable, and 
if the instrument noise is Gaussian and satisfies the same 
assumptions as before, one has p(x\e) = 



, ,1/2 r 2^N/2-l-e 

± dS d4> TT i-e"* 
2* J -1/2 Jo ^ k tL ^ 



\x k+ e — eu(k+6)e" t '\ 2 



(6.7) 



In this expression, which involves a product over all fre- 
quency bins, the index k has been shifted so that k = 
labels the £'th bin. 
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When searching for a signal peak in the vicinity of 
the rth bin, there are practical reasons (computational 
efficiency and algorithm structure) why it is desirable to 
use only information from (some small number of) nearby 
bins[E0[. Fortunately for us, the Neyman-Pearson criteria 
can be easily derived for this more limited information: 
we merely write down the pdf for the part of the data (the 
nearby bins) which are available to us. From this point 
on, we will assume that our search for a signal in the 
vicinity of the £'th frequency bin is restricted to 2P + 1 
bins. These are the Pth bin itself, and P frequency bins 
to its left and to its right. For this restricted data set, 
the pdf is p(x\e) = 



1 f 1 ' 2 
— / dS 

2lT J -1/2 JO 



2tt 



n 2^ 2 



\%k+l— euj(k+S)e" l '\ 2 



(6.8) 



k=-P 



One may now easily write down the likelihood function, 
and an optimal statistic, in the weak signal limit, making 
use of Eqn. (5.1) and (5.2). ft is easily verified that there 



are no terms of order e. Writing the pdf in the form 



I 

2^ 



1/2 

dS 

■1/2 JO 



W(e) 



p(x\e) 
where 

P ( i 

W(e)= \-^k+i-^{k + 5)c 1 ' 



(6.9) 



k=-P 



In 2tt 



and taking two derivatives w.r.t. e, one has 
f 



p"(x\Q) 



2tt 

p(x\0) 



,1/2 


,2w 


/ dS 




1-1/2 


Jo 



(w'(o)Y 



*d4 

2tt 



(W'(0)f 



-W"(0) 

w"({ejho) 



.1/2 

dS 

-1/2 JO 

We will do similar calculations later, in much less detail. 
The derivatives are easily evaluated: 

p 

= Y, R + «)e*fc.ll) 



k=-P 



W"{0) = 



d 2 W 



de 2 



(6.12) 



k=-P 



The integral d(f> of W'(0) 2 is evaluated by noting that for 
any complex numbers A and B 

2ir j / 



2?r 



3? (Ae*) $t (Be 1 ' 



\A\\B\ [ ^cos(0-Va)cos(0- i) B ) 

Jo 27r 

/ — [cos(^s - i>A) + cos(2<?!> - IpA - V's)] 

Jo 27r 



\A\\B 



\A\\B\cos(tJj b -^a) 



(6.13) 



Making use of this, the inner integral in (6.10) gives 

[(^'(°)) 2 + ^"W 

1 P P 
= 2^ 51 51 x r+t x r'+t.u(r + S)uj*(r' + S) 

r=-P r'=-P 
P 

£ Hr + S)\ 2 . 

r=-P 



Substituting this back into expression (|6~To| ) for the sec- 
ond derivative of the pdf yields 

p p 

= \ E 4 + tMrr>X r , +i - £ M rr . (6.14) 



r,r' — — P 



r=-P 



Here, M rr i is a (2P+ l)-dimcnsional square, symmetric, 
real, positive-defi nite matrix. Making use of the defini- 
tion of u in Eqn. (6.6) gives 



M 



M, 



1/2 



(6.15) 



dS u>(r + 5)u>*(r' + S) 

1/2 
1/2 

1)"' / d5 j Q (ir(r + 6)) JO (ir(r' + 5)) 

-1/2 



Adopting the Einstein summation convention (the re- 
peated indice s r an d r' are summed from — P to P) and 
substituting ( |6.14 ) into the weak-signal approximation 
.21) of the likelihood function one obtains 



A(x) 



2 \2 

J2 



c r+e x r *+e 



M r 



Mrr'- (6.16) 



r+i'^r'+i is 2S rr ', where 8 rr ' is 



In the absence of a signal, Eqn. (5.3) shows that the 
mean value of A — 1 must vanish. This is clearly the 
case, since under our assumptions, in the absence of a 
signal, the mean value of x* , „x. 
the Kronecker Delta. 

We note that the formalism of this Section can be triv- 
ially adapted to the case where the frequency of the signal 
lies in any desired range ±A around the ^'th bin. The 
only change is that in Eqn. ( |6.15 ) one makes the trans- 
formation 



1/2 



dS 



-1/2 



1 

2A 



A 



dS. 



(6.17) 



— A 



In the limit A — > 0, it's easy to see that Moo = 1 and 
all other components of M rr > = 0. The results are then 



identical to the resolved-frequency case of Section III 



The results of this Section can be summarized in a 



few lines. In Section III we studied the case where the 
signal frequency was exactly resolved. In this case, we 
found that the optimal statistic was the power in that 
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bin. Thresholding on this statistic gave the lowest false 
dismissal probability for a given false alarm probability. 
In this Section, after assuming that the signal frequency 
is uniformly distributed around bin £, we have found that 
the optimal statistic (in the w eak-s ignal case) is to thresh- 
old on the bilinear quantity ( 6.16 ). We can choose (from 
the value of P) how much of the data around the given 
bin to use. If P = we recover the power statistic of Sec- 
tion III. If P is larger, then additional information from 
neighboring bins also gets added, and the test performs 
better. In the following Sections, we will analyze the per- 
formance of this test, using the methods of Section IV to 
compare the optimal statistics for different values of P. 



VII. PROPERTIES OF THE MATRIX M 

Let us begin by exhibiti ng th e (2P + l)-dimensional 
matri x M, given by Eqn. ( 6.15 ). It's easy to integrate 
(6.15) to get an exact expression for the matrix in terms 
of sine- and cosine-integral functions Si and Ci. On the 
diagonal (no summation convention on n) 



M nn = 



-Si(vr(2n + 1)) - isi(7r(2n - 1)), 

7T 7T 



vr 2 (4n 2 - 1) 
and off the diagonal 

C(2m - 1) - C(2m + 1) - C(2n - 1) + C(2n + 1) 



M, 



2n 2 (n — to) 



where C(x) = Ci(7ra;) — Inx. In these equations, the range 
of the subscripts n, to is — P, ■ ■ ■ , P. 

The "central" element of M has row and column num- 
ber zero. The matrix extends away from this central el- 
ement by an amount determined by the value of P. For 
example, if P = 2 one has the 5-dimensional matrix: 



M = 0.7737 



0.0181 0.0422 

0.0422 0.1017 

-0.0169 -0.0761 

-0.0366 -0.0761 

-0.0169 -0.0366 



-0.0169 
-0.0761 
1.0000 
-0.0761 
-0.0169 



-0.0366 -0.0169 

-0.0761 -0.0366 

-0.0761 -0.0169 

0.1017 0.0422 

0.0422 0.0181 



where the O'th row and column are highlighted, and we 
have taken out an overall factor of Moo- Note that this 
matrix is invariant under reflection about both diago- 
nals, so it can be presented by listing just the P + 1- 
dimensional block of elements with non-negative row and 
column number. 

Because the matrix M is real and symmetric, it can be 
diagonalized by a similarity transformation 



D 



O^MO 



(7.1) 



where O is an orthogonal square matrix O* = O -1 , and 
D is diagonal. Because M is positive, its eigenvalues are 
all real and positive. To six decimal places of accuracy, 
for the first few values of P, the eigenvalues of M are 
given by: 



A 


= 7.82774 x 


10 


-1 


Ai 


= 1.37549 x 


10" 


_\ 


A 2 


= 1.07687 x 


10" 


_ 2 


A 


= 7.83230 x 


10" 


-1 


Ai 


= 1.64608 x 


10" 


-1 


A 2 


= 1.12358 x 


10" 


-2 


A 3 


= 8.16859 x 


10" 


_ g 


A 4 


= 1.53779 x 


10" 


_ 


Ao 


— /.oool I X 




1 


Ai 


1 7 £11 70 \s 




1 


\„ 

A2 


— 1 1 ^997 v 


1U 


2 


A3 


= 1.20531 x 


10" 


4 


A 4 


= 1.91042 x 


10" 


6 


A 5 


= 3.03979 x 


10" 


9 


A (i 


= 2.72000 x 


10" 


11 



for P = 1, 



(7.3) 



for P 



(7.4) 



for P 



(7.5) 



We will see shortly that these eigenvalues determine the 
false alarm and false dismissal probabilities for the cor- 
responding threshold statistics/tests. 

The case analyzed in Section III, where the signal fre- 
quency is resolved, and a one-point test is used, corre- 
sponds to setting P = and havi ng A n = 1. This is 
the limit when the frequency band ( 6.17 ) over which the 
signal is distributed is very small, and centered around a 
bin frequency. In the opposite limit where the frequency 
band ±A is large, the matrix M approaches something 
proportional to the identity matrix, with a large number 
of nearly-equal eigenvalues. 



VIII. PERFORMANCE OF THE OPTIMAL 
TEST FOR UNRESOLVED SIGNALS 

The situatio n we are considering is defined by the pdf 
given in Eqn. (6.7). We will suppose that we have im- 
plemented a search for sinusoidal signals (in the weak 
signal limit) using the thresholding statistic defined by 
Eqn. (6.16), for a particular value of P. We will call such 
a test the "2P + 1-point test" . For example, the "five 
point test" makes use of the data samples in the five bins 
nearest to some central bin, to determine if a sinusoidal 
signal is present in that central bin. 

Our goal is to determine the false-alarm and false- 
dismissal curves for different values of P. In this way, 
one can quantify the loss of performance that arises from 
throwing away the additional information coming from 
bins located away from the bin of interest. 

Let us first calculate the false-alarm probability for the 
2P + 1-point test. This is easy because it only involves 
the probability distribution p(a;|0) (and its second deriva- 
tive) for vanishing signal strength, which is an indepen- 
dent Gaussian in each frequency bin. We choose, as our 
optimal statistic, the quantity 



An 



7.73695 x 10" 1 for P = 0, 



(7.2) 



x'Mx 



(8.1) 



11 



where x is a vector of (frequency space) data around 
the bin of interest. This differs from A — 1 by a data- 

2 

independent constant term, 4^-, so it has the same level 
surfaces. Thus, for the 3-point test, the optimal statistic 
to threshold on would be 



0.0787 -0.0589 -0.0589 
-0.0589 0.7737 -0.0589 
-0.0589 -0.0589 0.0787 



Xe-1 
Xt+1 



In the absence of signal, each of the Xj is an independent 
random Gaussian variable with zero mean and unit vari- 
ance. Thus, if U is a unitary matrix, the column vector 
of variables Ux are also independent random Gaussian 
variables with zero mean and unit variance. Since the 
orthogonal matrix U = O -1 that diagonalizes M is uni- 
tary, the statistical properties of the optimal statistic r 
( 3T ) are the same as those of a random variable 



2P 



where each z r is an independent variable whose real and 
imaginary parts have independent Gaussian pdfs with 
zero mean and unit variance. Note that the pdf of u = 
\z r \ 2 is exponential with mean and variance equal to 2P. 

The pdf of the statistic r is easily computed using gen- 
erating functions. Suppose that r is any random variable, 
and p(r)dr is its probability density. We define the gen- 
erating function p{£) to be the expected value of e^ r 



p(0 



dTp(r)e 



This is basically the Fourier transform of the pdf. It 
makes it simple to compute the pdf of a random variable 
that is a sum of other random variables. Since 



2P 



E A - ? 



where each u r is a real random variable with pdf 
p(u)du 



for u < 

\%- u l 2 du for u > 



the generating function for the pdf of r (in the absence 
of a signal) is 



2P 



p(0 



n 

r=0 
2P r 

n 

r=0 
2P 



-u r /2 



du r — e 
2 



t-r/2 



3 i{(Ao«o + ---A2j=M2i») 



r=0 2 J ° 
2P 

H(i-2i^x r )- 1 



This closed form for the generating function p makes it 
possible to find the probability distribution of the optimal 
statistic t in the absence of a signal. 

To determine p from p, we invert the Fourier transform 



1 r°° 

27r J-oo 



This gives 



1 

2r7 



2P 

n 

r=0 



2Ar 



I 

2\ r 



(8.3) 



The integral clearly vanishes for r < 0, because the in- 
tegrand has all of its poles in the complex £-plane below 
the real-£ axis. If t < 0, the sign of the exponential 
term permits the contour of integration to be closed in 
the upper-half £-plane. Since there are then no poles 
contained inside the integration path, Cauchy's theorem 
implies that p(r) = for r < 0. 

To find a closed form for p(r) when r > 0, one must 
close the integration contour in the lower-half £-plane. 
The residue theorem then implies that p(r) is a sum over 
the resides of the poles, which are located at £ — —i/2X r . 
One obtains 



2P 



>M = E 



r=0 
2P 

E 

r=0 



-T/2X r 



2P 



2A r 



r'=0 
r'^r 

-r/2X r 



A r t 



n 



2A r 



(8.4) 



Here, we have introduced the set of 2P + 1 weights 
Co, • • • , C2P defined by 



2P / a 

n '"AT 

r'=0 v 
r'^r 



(Note: if P = then cq = 1). These weights have several 
interesting properties. In particular 



2P 



Cr = 1 , and 

r=0 

2P 2P 

c r X r = A r = M r 



(8.5) 
(8.6) 



r=0 



r=0 



These weights simplify the notation in what follows. 

The false alarm probability a{T) can now be obtained 
by straightforward integration: 



a(T) — j dr p(r) 

2P 

-r/2A r 



= E 



•7) 



r=0 



r=0 
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It follows from (fl| that a(0) = 1. 

Our calculations assume that the eigenvalues X r are 
distinct (as is the case here). If m of them were equal 
then a polyn omi al of order to — 1 in t would appear on 
the r.h.s. of (3.4) and a polyn omi al of order m — 1 in T 
would appear on the r.h.s. of (8.7). 

For concreteness, we give the numerical form of the 
false alarm functions for the first few values of P. The 
subscript on a denotes 2P + 1: the number of points used 
in the test. 



ax(T) 
as CO 

as CO 



= e 



-0.646249 T 



-0. 63875 T+0. 207097 _ -3. 6351 T-l. 46410 
, c -46.430 T-6. 73815 

e -0. 63840 T+0. 250487 _ -3. 0375 T-l. 25272 
_|_ g-44. 500 T-6. 83620 _ g-6121.0T-21.6738 



, e -325140.0 T-37.5716 

The false dismissal probability (3 is a bit more challenging 
to calculate. However for the weak-signal case of interest, 
it is still possible. 

To find false dismissal probability (3 we begin by writ- 
ing the pdf for the weak signal case as 

p(x\e) = p(x\0) + ^eV'(z|0) 

/ \ \ ( 1 op"(x\0) 

= p(l|0) ^1 + ^e 2 QiE* + ^M rT .'X r ' + ^ - Mr^j 

= p(*|0)(l + ^g- 



where r is the optimal statistic ( |S.l|) . From this, we 
can immediately write an expression for the generating 
function of p(r\e) to lowest order in e, 



2P - ,>oo -i 
r=0 lJ ° 1 



-Ur/2 



where as before r = Aoito + • • • ^2PU2P- Since differenti- 
ating w.r.t. £ brings down a factor of it, one has 



i 



1 d 



p(C|0). (8. 



This relation is easily inverted to find a lowest-order for- 
mula for p(r\e). We simply integrate the new term by 
parts: 



d£e" 



1 

2^ i-oo d£ 
tp(t) = rp(r|0) 



— OO 

00 d 



-2 . 1/2 


> 






£~ 2 y=e~*(l-a-P) 









0.4 0.6 0.E 

False alarm probability a 



FIG. 5: Bottom four curves: The detection probability 



'(1 



f3) is plotted as a function of the false 



alarm probability a, for the 1,3,5, and 7-point optimal tests 
defined by Eqn. (p.l|), in the weak-signal limit. While us- 
ing the additional information in the neighboring bins does 
improve the detection probability, the improvement is slight. 
Top four curves: The significance e _2 s/ v / n is plotted for 
the same 1,3,5, and 7-point tests, in the weak-signal limit. 
The maxima of the eight curves is given in Table \ 



Thus we find a formula for the pdf of the optimal statistic 
r in the small-e limit: 



Since the pdfs on both sides are normalized, an important 
consequence of this is that the mean value of the test 
statistic in the absence of a signal is 



y roc 

M rr — \ r — — dr 
r=o 2 Jo 



tp(t). 



This is because the mean value of the likelihood function 
in the absence of a signal is unity. It's also easy to show 
that / °° a(T) dT = 2M rr . 

From this it is straightforward to calculate the false 
dismissal probability 



/3(0 



dr p{r\e) 



2 

4 

r=0 



l- 6 -M r ^j J dr P (t\0) + ie 2 J dTT P ( T \Q) 
e2 M r ^j (l-a(T)) 

c r (2X r - (T + 2A r )e- r/2A ^ 

dra(r) . 



1 - a(T) - — 



A bit of rearrangement gives us the weak-signal detection 
probability 7(7") = 1 — a{T) — (3(7~) as a function of the 
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1-pt 


3-pt 


5-pt 


7-pt 


Max = 


0.1424 


0.1465 


0.1477 


0.1483 


T = 


1.548 


1.863 


1.918 


1.942 


a — 


0.3679 


0.3739 


0.3767 


0.3775 


Max -v^= = 


0.3113 


0.3188 


0.3204 


0.3211 


T = 


2.467 


2.773 


2.821 


2.840 


Q = 


0.2031 


0.2093 


0.2121 


0.2135 



TABLE II: The maximum detection probability 7 and sig- 
nificance s of the optimal 2P + 1-point peak detection tests, 
for P = 0,1,2 and 3. These correspond to the curves of 
Fig. The top half of the table lists the maximum value 
of the detection probability 7 = 1 — a — 0, and the values of 
the threshold T and false alarm probability a for which that 
maximum is obtained. The bottom half of the table lists 
the maximum value of the significance s, and the values of 
the threshold T and false alarm probability a for which that 
maximum is obtained. 



threshold: 



2 7(T) 



2 V 2 
2 { 2 



1 2P 



M rr )a(T) + ^J dT a ( T ) 



Mr 



1 2P 

,)a(T) + - y^c r A r 



-T/2\ r 



r=0 



r 

— M rr + A, 



,-r/2A r 



(8.9) 



These formulae make it clear that 7 = 1 — a — f3 vanishes 
as T — > and as T — > 00. 

It is instructive t o re turn b riefl y to the P = (one- 
point) test. Eqns. (8/7) and (|8.9|) give false alarm and 
signal detection probabilities: 

ai(T) = e- T / 2Ao , and 



TiCO = 



1 — a± 



-Te~ T/2X ° 

Ao . 

ai Inai 

2 



These should be com pare d with th e resolved-frequency 
case, given in Eqns. ( |3.5| ) and (3.8). As expected, the 
formulae are identical if Ao = 1. However, for t he un- 
resolved frequency case of this Section, Eqn. (7.2) gives 
Ao ~ 0.773695. Hence the signal detection probability 
at a given false alarm probability a is lower than in the 
resolved-frequency case. 

1 2 

For a resolved signal 7 = --e ulna. 



For an unresolved signal 7 



-0.3868 e'alna. 



Thus, for weak signals, the detection probability of a one- 
point test for unresolved signals is 77% the probability 



of detection of a one-point test for resolved signals. This 
can also be seen by comparing the maxima of the 1-point 
detection probabilities shown in Figs. || and |^. 

For the first few values of P, the detection probability 
is given by 



■ 2 7im 

-W) 



2 7 5 (T) 



Te 
(T 
(T 
(T 
(T 



(T 



1.38629-0.646250 T 

1.17920— 0.638755 T 
2. 85039-3. 63507T 
8. 12445-46.4309 T 
1.13581—0.638380 T 
2.63902-3.03752 T 
8.22249-44.5006 T 
23. 0601-6121. 0T 
38.9580-325142.0 T 



0.29663) e 
1.58708) e 
1.84064) e 
0.35186) e 
(T - 1.58910) e 
(T- 1.89585) e 
1.91816) e 
1.91832) e 



where the subscript on 7 is IP + 1: the number of points 
used in the test. Fig. || shows the detection probability 
and significance as a function of false alarm probability 
a for the 1-, 3-, 5- and 7-point tests, for this case, where 
the signal frequency is uniformly distributed in the range 
S G ±1/2 a bin. It is clear from this Figure, and from Ta- 
ble |n| that while adding the additional information from 
the nearby frequency bins does improve the detection 
probability and significance slightly, the gain is relatively 
small. In practice, there is little to be gained from going 
beyond the 3- or 5-point tests, as can be seen by not- 
ing that the eigenvalues of M drop to small values very 
quickly with increasing P. This means that for sensible 
values of the threshold, the terms that they add to a and 
(3 have very small effects: the dominant terms are from 
the largest eigenvalues. 



IX. INTERPRETATION OF RESULTS AS 
FREQUENCY SPACE "INTERPOLATION" 

In this Section, the optimal statistic r of the previous 
Section is shown to have a simple intuitive interpretation: 
it is the total power contained in a continuous spectrum 



in the frequency range /, 



£-1/2 



<f<L 



The contin- 



uous spectrum is obtained from the discrete spectrum Xj 
via frequency-space interpolation . 

This frequency-space interpolation may be understood 
in terms of "zero-padding", as follows. 

• Start with the low-resolution f requ ency-domain 
Fourier amplitudes Xk defined by (2.2). Here, "low- 
resolution" indicates that the frequency spacing be- 
tween successive bins is 1/T. 



• Transform these into time-domain yj 
0,---,N-l. 



for 



J 



• Zero-pad the time-domain data to L times its orig- 
inal length N, by appending (L — 1)N zeros, for 
./ V.....A7. 1. 
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• Now transform back into the frequency-domain to 
get a higher- frequency-resolution set of Fourier am- 
plitudes Xk- Here "high-resolution" indicates that 
the frequency spacing between successive bins is 
l/LT. 

In the limit L — > oo this gives rise to a continuous spec- 
trum x(f). The optimal statistic r of the previous Sec- 
tion is exactly the signal power contained in this contin- 
uous spectrum in the range from fi-1/2 < f < fe+i/2- 
This quantity only depends on the Fourier amplitudes Xk 
because the zero padding has not added any information 
to the original data set 

To prove this assertion, we first derive a formula for the 
high-resolution DFT in terms of the lower-resolution one, 
following the procedure above. The Fourier am plitudes 
of the time-domain samples yj are given by (|]^) as 

2V-1 

x k = Vj e 2vi3k,N , for k=-N/2+i,-,N/2. (9.1) 
j=o 

The inverse relationship gives the time-domain samples 
in terms of the Fourier amplitudes as 



N/2 



Vk 



x i e~ 2wijk/N , for fc =o,-,JV-i. (9.2) 



j=-jV/2+l 



Zero-pad these time-domain samples by appending (L — 
1)N zeros, so that the total number of time-domain sam- 
ples is now NL. Taking this back into the frequency 
domain gives the high-resolution Fourier amplitudes (for 
k = -NL/2+l,---,NL/2) 



Xk 



NL—1 

N-l 

3=0 

N-l N/2 



j=0 r= -N/2+l 
N/2 h 

£Av(f-r> 

r=-N/2+l 



-2wijr/N 2-Kijk/NL 



(9.3) 



In the third line, we have c arri ed out the sum over j by 
using the geometric series (6.3). The last line is the de- 
sired result giving the high-resolution Fourier amplitudes 
x in terms of the low-resolution x's. The Dirichlet kernel 
Dn (6.5) is responsible for doing the interpolation. 



The high-resolution spectrum has exactly as many de- 
grees of freedom as the low-resolution spectrum, although 
it has L times as many frequency bins. This is because 
the amplitudes in the high-resolution spectrum are cor- 
related with each other. The high-resolution spectrum 
also contains an exact duplicate of the low-resolution 



spectrum. Since Djq vanishes for non-zero integer ar- 
guments, and Djv(O) = 1, every L'th high-resolution bin 
contains the same value as one of the low-resolution bins: 
XLr = x r for all integer r. 

To finish proving the assertion, we calculate the av- 
erage power in the high-resolution frequency bins k = 
L(l - 1/2), • • • , L{1 + 1/2) - 1. These L high-resolution 
bins cover the frequency range from fi—x/2 to fe+i/2> 
which is ±1/2 a bin around the £'th bin. Anticipating 
the final result, this quantity is denoted "r". It is 

L-l 

1 V— * I _ \2 

T = T \ x Ll-L/2+k\ 



k=0 
L-l 



fc=0 



N/2 
r=-JV/2+l 



k 1. 

r ) x r 

L 2 J r 



Since Dn(x) is peaked around x = 0, in the spirit of 
the previous Section, this may be approximated as the 
sum over the 2P + 1 bins around the £'th bin. Further 
justification can be found in Section |x| and in Fig. ||. This 
gives 



1 L ~ l 



L 



k=Q 



Ek 1 
Dn{— - ^ - r) X£ +r 

r=-P 



L 2 



(9.4) 



In the continuous limit, when the number of high res- 
olution frequency bins L — > oo, the outer sum can be 
converted into an integral over 5 — k/L — 1/2, giving 



1/2 

dS 

-1/2 



^ D N (S - r)x e+r 
=-p 

^ xe+ r S rr 'Xg_\_ r i . 



=-p 



Here, the matrix S rr i is a 2P + 1 dimensional Hermitian 
matrix defined by 



S r 



1/2 

d5 D N {S -r)D* N (6 -r'). (9.5) 

-1/2 



This equation shou ld be compared to the definition of 
L rr i given in Eqn. ( 3.15 ). Making the same large N ap- 
proximation as earlier gives 

S rr , = e l < r - r '^~^) ['dS J0 (ir(6 - r)) ]0 (ir(6 ~ r')) 

J -1/2 

« e i7r{r - r,) [ 1 d5j (K(6 + r))jo(*(S + r ')) 

J -1/2 



M r 



(9.6) 



Thus, the optimal statistic t of the previous Section is 
just the average power in a continuous interpolated spec- 
trum within a frequency band of width ±1/2 a bin around 
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X. WHY "WINDOWING" DOES NOT GIVE A 
BETTER TEST 

Windowing is a well-known method for reducing the 
bias in a power spectrum, particularly for frequencies 
that are not resolved. It is natural to ask if this technique 
might provide a better test than the Neyman-Pearson 
test. 

For large P (the number of bins used on either side 
of bin £) the answer is clearly "no". In this case, the 
Neyman-Pearson test is (by its very definition) the opti- 
mal test. However, if P is very small, one might wonder 
if windowing could provide a better test, or if for large 
P, windowing might provide a more efficient implemen- 
tation of the optimal Neyman-Pearson test. The reason 
is that in frequency space the amplitudes \xk\ fall off 
oc k^ 1 away from the peak. One might then wonder if 
windowing can "concentrate" more of the power close to 
the peak, to provide a better test when P has small val- 
ues. As we shall show, the answer to the question is still 
"no" even when P is small. 

"Windowing" is the process of multiplying the time- 
domain data yj by a time-domain window function Wj, 
then transforming the data into frequency space. Thus 
Uj — ► WjUj in (2.2). This is also referred to as "apodiz- 
ing" or "tapering" . Note: in addition, one may zero-pad 
the data set before taking it into the frequency-domain. 
But, as described in Section [X the optimal test already 



effectively does this, in the limit of infinite zero-padding. 

Common choices of windowing functions are given such 
names as "Hamming", "Parzen", "Welch" and so on. 
These window functions are are chosen for their prop- 
erties: quickest side-lobe falloff, narrowest -3db range, 
minimum spectral bias, and so on. As an example here, 
to explain why windowing the data first does not provide 
a better test, we take as a window function the cosine 
window 



2ttj 
N 



(10.1) 



The situation for other windowing functions is similar. 

The window function is normalized so that the total 
power in the spectrum is the same with or without the 
window. This is ensured by the condition (true for large 
N) 



N-l 

E 

3=0 



an 



N. 



(10.2) 



This condition ensure that for stationary noise, the sta- 
tistical properties of the noise in the frequency bins is the 
same with or without the windowing. Thus, for exam- 
ple, the expected power spectra of independent Gaussian- 
distributed time-domain samples (white Gaussian noise) 
are exactly the same for this window and for the rectan- 
gular window uij — I. 
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FIG. 6: The frequency-domain effects of windowing sinusoidal 
signals of amplitude e are shown in the absence of noise. The 
bottom graph uses a rectangular window w% = 1 (no win- 
dowing}. The top graph uses the cosine window defined by 
Eqn. (10.1). The solid curves show how the power \xk\ 2 is dis- 
tributed bin-by-bin around the peak at k = I, for five di ffer- 
ent frequencies defined by 8 = 0, —0.1, ■ ■ ■ , —0.5 in Eqns. (6.1- 
6.2). The dotted curve shows the average. Windowing greatly 



reduces the difference in \xt\ between resolved frequencies 
(<5 = 0) and unresolved frequencies, so it reduces the bias in 
a spectrum. However it also reduces the power in the peak 
substantially: the mean value is 0.60e 2 with windowing com- 
pared to 0.76e 2 without windowing. This means that win- 
dowing does not give a better test: at a given threshold T it 
yields a larger false dismissal probability. 



Sh own in Fig. || are the spectra of sinusoidal signals 
( |2.lD for the frequency bins near the peak. In the unwin- 
dowed case, a resolved signal (S = 0) has all its power in 
the £'th bin: \xi\ — e 2 . As the frequency shifts upwards 
to 5 = —0.5, the magnitude of \xe\ 2 drops to 0.40e 2 . The 
adjacent (t + l'th) bin also contains 40% of the energy. 
The remaining bins contain the other 20% of the energy, 
mostly in bins £—1 and £+2. The large magnitude of this 
ratio 1/0.40 = 2.5 is one reason why rectangular windows 
are often undesirable: a peak at a resolved frequency can 
be as much as a factor of 2.5 times higher than the same 
peak at an unresolved frequency. In contrast, in the win- 
dowed case, the magnitude of \xe\ 2 = 0.67e 2 when 6 = 
and only drops to \xi\ 2 = 0.48e 2 when 5 = —0.5. The 
ratio 0.67/0.48 = 1.38 is much smaller, hence the cosine 
window produces a less biased power spectrum than the 
rectangular window. 

But Fig. ^ also makes it clear why windowing does 
not result in a better test for sinusoidal signals buried 
in noise than the Neyman-Pearson test, even for small 
P. The reason is that windowing "broadens the peak" 
for signals that are near resolved frequency even more 
than it "sharpens the peak" for signals that are far 
from a resolved frequency. The dotted lines in Fig. ^ 
show the average power (averaged over the six values 
6 = 0, 0.1, • • • , 0.5. In the windowed case the average 
power in the peak is only 0.60e 2 compared to 0.76e 2 for 
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the unwindowed case. This reduction in peak power re- 
sults in a tremendous loss of significance for small e, when 
the signals are buried in noise. For a given value of the 
threshold T (corresponding to a fixed false-alarm proba- 
bility), the windowed signal is far less likely to cross the 
threshold when a signal is present than the non-windowed 
signal. Thus, it has a higher false dismissal probability 
than the Neyman-Pearson test. 

Fig. ^ also demonstrates that in the unwindowed case, 
almost all of the power is within a few bins of the peak. 
Consequently even small values of P will give a nearly- 
optimal test. For example even for the worst-case signal 
(6 = —0.5) over 92% of the power in contained in just 
the the range of bins from £ — 2 to i + 2. Averaging 
over S, these bins contain more than 96% of the signal 
power. When P is increased this rises rapidly: in the 
worse case (5 = —0.5) for P = 10, the 21 bins around the 
peak contain more than 98% of the total power. There is 
effectively nothing to be gained by increasing P to larger 
values. 



XI. OPTIMAL TESTS IN THE PRESENCE OF 
NON-GAUSSIAN NOISE 

Section [y] showed how the weak-signal assumption of 
small e permitted several useful simplifying approxima- 
tions. One important simplification was that the optimal 
statistical test does not depend upon the amplitude e. 

This same weak-signal assumption also makes it possi- 
ble to find the optimal statistical test for signals hidden 
in certain ty pes of non-Gaussian noise as described, for 
4|. Consider the following generaliza- 



example, in 
tion for the pdf (6. 



p{x\e) 



(11.1) 



1/2 

dS 

-1/2 



2n 11 



k=-P 



2-KSk 



The Gaussian case treated in Section VI is a special case 
of this, for which gk{%) = % and Sk = 1- These types 
of non-Gaussian noise models, and the methods that are 
being used here (locally optimal tests) are discussed in 
more detail in jl3|, [l4| , where they are used to construct 
optimal search techniques for stochastic background de- 
tection and for matched filtering. 

This form of the pdf assumes that the noise in the dif- 
ferent frequency bins is independent, but it allows each 
bin to have its own, different, arbitrary statistical distri- 
bution. For example, this can describe a very common 
situation, where the pdf has a central Gaussian region, 
plus a non-Gaussian tail. Typically there is a "knee" at 
some characteristic signal amplitude, where the slope of 
the distribution changes, or the non-Gaussian tail begins. 
Some preliminary work [ fL5[ has shown that it is straight- 
forward to approximate these functions given a real data 
stream. 





FIG. 7: An example of a function g(x) corres pond ing to 
non-Gaussian statistical behavior, given by Eqn. ( 11. 2| ) with 



20 and p — 0.999. Notice that in the centra" 



Gaussian 



region, g'(x) « 1, whereas g'(x) — > o~ when the argument x 
is larger than w a 2 12. The dotted line in the bottom graph 
shows (for comparison purposes) g(x) = x. 



Th e fun ctions gk are not completely arbitrary. In order 
that (11.1) be properly normalized, one must have 



du e - 9fe(u) = 1. 



For any functional form of g, this can be satisfied by 
adding the correct constant term to g. We also require 
that g satisfy the additional normalization condition 



duue- 9k{u) = 1, 



which can always be satisfied by re-scaling the argument 
of g. One then has 



dx p(x\0)x* k x r — 25k r Sk, 



so the positive weights Sk can be interpreted as the mean- 
squared noise power in the /c'th frequ ency bin. This for- 
mula should be compared with Eqn. (3.4). 
For example one might have 



e~ 9 ^ = k 



p e 



o/a 2 



(11.2) 



where k = p + (1 — p)o 2 ■ Here we assume that p is 
positive and less than unity. The cases of most interest 
are when 1—p is very small, and a 1 is large, so that itsil, 
Shown in Fig. ^ is a graph of g(x) and g'(x) for the case 
where p = 0.999 and a 2 = 20. This corresponds to a 
case where 99.9% of the data is described by a Gaussian 
distribution with unit variance. The other 0.1% of the 
data samples are outlier points, described by different 
Gaussian distribution with a variance of 20. 
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It is straightforward to derive the optimal peak- 
detection statistic in the weak signal limit, by proceeding 
exactly as in the Gaussian case of Section VI. We write 



p(x\e) 



1/2 r 2n ji 

dS / ^e ff W, 

-1/2 JQ 2-7T 



(11.3) 



where W(e) 



E 

k=-P 



-9k 



\x k +e - eujjk + S)e^\ 
2S k 



— In 2itSk 



(11.4) 

As before, it's easy to see that p'(x\e) vanishes at e = 0. 
So the first non-vanishing derivative is 



J -1/2 Jo 27r 



p{x\Q) 

The derivatives of W that appear are: 



(W'(0)) 2 + W"(0) . (11.5) 



W'(Q) 
and 
W"(0) 



k=-p 



9k 



\xk+e\ 
2S k 



M{x* k+ Mk + sy*) 



E 

fc=-p 



25fe 



jg + (5)e^) 



-i 2 



„ / \x k+t \ z 
9k l~2ST 



where g' k and g' k ' are the first and second derivatives of 
the function g k w.r.t. its argume nts. Using (6.13) to 
evaluate the integral over 4>, and fl6.15|) to evaluate the 
integral over 6 gives 

P"(x\0) 
p(x\0) 



9'k (^f ) 9'r (^) 



- E 

k,r=-P 

P II ( \x k+l \ 2 
1 x - 9k { 2S k J , 

~9 1^ 52 M kk \x k+e \ 



x* k+l ,M kr x r+ i 



k=-p 
p 
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k=-P 



, / W/l 3 > \ 



9/v 



5V 



M fcfc 



(11.6) 



A good algebraic check is to verify that in the absence of 
a signal the mean value of this quantity vanishes. 

Thus we arrive at the final result: the optimal weak- 
signal detection statistic in the non-Gaussian case. Leav- 
ing out the data-independent constant term, it is 
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E 

k,r=-P 



flk 



2S k 



fir 



2S r 



S k S r 



x* k+e M kr x r +e 



E oT ' M kk \ Xk+e f 

k=-P 



SI 



.1 



(11.7) 



in the Gaus- 



This reduces to the original expression 
sian case, where g' = 1 and g" = 0. In the non-Gaussian 
case (refer to Fig. [?]) the effect of the g' and g" terms is 
to "clip" or "truncate" the effects of outlier points. 
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